Optimal Feature Based Density Clustering for Outlier Detection in Multivariate Data

Abstract

Efficient outlier detection in a big data environment is complicated by the volume of information that must be processed and handled. When separating outliers from normal data items, many existing methods become complex because of the number of features involved in each attribute. Identifying the appropriate features is what defines the characteristics of the data, yet analyzing every feature increases both processing time and memory consumption. To address these issues, this paper proposes the Optimal Feature based Outlier Factor Model (OF-OFM), an outlier detection approach preceded by a feature optimization step. First, a preprocessing stage formats all data instances in the dataset, which is deployed in a Spark architecture. Next, Ant Colony Optimization is used to determine an optimal subset of the full feature set. The Generalized Sequential Pattern (GSP) method is then used to form tightly coupled sequential patterns that exclude outliers on the basis of the feature set. A density-based clustering approach groups these sequentially associated patterns into dense clusters. Finally, a Local Outlier Factor (LOF) based method separates the outliers from the information processed so far. The effectiveness of OF-OFM is evaluated in terms of Area Under Curve (AUC), CPU utilization time, execution time, detection accuracy, and memory consumption against existing outlier detection methods. OF-OFM is reported to be more effective than the other approaches.
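
The final stage named in the abstract, Local Outlier Factor scoring, can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain Python with no Spark, feature selection, pattern mining, or clustering, and the function names, the sample points, and the choice of k = 3 are all assumptions made for illustration. It also assumes the points are distinct.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbors of points[i]."""
    dists = sorted(
        (euclidean(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return dists[:k]

def local_outlier_factors(points, k=3):
    """Compute a Local Outlier Factor score for every point (hypothetical helper)."""
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = [nbrs[-1][0] for nbrs in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j))
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / max(sum(reach), 1e-12))

    # LOF: mean density of the neighbors divided by the point's own density
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Assumed sample data: a small dense group plus one distant point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = local_outlier_factors(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

Scores close to 1 indicate a point whose local density matches that of its neighbors; a score well above 1 marks a point that is much sparser than its neighborhood, which is the sense in which LOF flags outliers.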


Similar resources

Outlier Detection Using Extreme Learning Machines Based on Quantum Fuzzy C-Means

One of a data miner's most important concerns is having accurate, error-free data: data free of human errors, with complete and correct records. In this paper, a new learning model based on an extreme learning machine neural network is proposed for outlier detection. The function of neural networks depends on various parameters such as the structur...


Identification of outliers types in multivariate time series using genetic algorithm

Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...


A Spectral Clustering Based Outlier Detection Technique

Outlier detection is of increasingly high practical value in many application areas, such as intrusion detection, fraud detection, and the discovery of criminal activities in electronic commerce. Many techniques have been developed for outlier detection, including distribution-based outlier detection algorithm, depth-based outlier detection algorithm, distance-based outlier detection algor...


A Framework for Outlier Detection in Geographic Spatial Data

Outlier detection is a very interesting, useful, and challenging problem in the field of data mining. Because spatial data are sparse, distance-based clustering algorithms do not work for finding outliers in them. The problem of finding irregular features in spatial data needs to be explored. Many existing approaches have been proposed to overcome the problem of outlier detection in spatial Geogr...


Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines

In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...



Publication date: 2017